A More Robust Approach for Piano Transcription

Piano transcription is a difficult task, partly because human performances of piano music vary so widely. The piano is an unusually flexible instrument, and musicians often adapt music they did not write: arranging string-quartet or orchestral works for piano, reworking chorales as piano pieces, and adapting other composers' piano pieces for other instruments.

The difficulty of piano transcription has made it a focus of research in computer audition and music information retrieval for decades. The goal of this research is a system that can accurately transcribe a recorded performance into a musical score without human supervision. Such systems still fall short of the accuracy of a trained musician and are not yet reliable enough for most practical purposes.

A typical approach to piano transcription uses an end-to-end neural network that learns to recognize notes and their onset times. It builds on the idea that the audio can be represented as a continuous stream of partials, the individual sinusoidal frequency components that make up each sounding note. The network's output is segmented into individual note units, which are then expressed in a familiar, notation-like format. A regression model predicts the onset time of each note, yielding a sequence of piano notes that can be played back at any time, much like an audio recording of the original performance.
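
The segmentation step can be sketched in Python under a few assumptions: the network is assumed to output one activation probability per frame for each of the 88 piano keys, key index 0 is mapped to MIDI pitch 21 (A0), and an optional `onset_shift` array stands in for the regression output as a fractional-frame correction to each onset. The names `Note`, `frames_to_notes`, and `onset_shift` are hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Note:
    pitch: int     # MIDI pitch number (21 = A0 ... 108 = C8)
    onset: float   # seconds
    offset: float  # seconds


def frames_to_notes(
    frame_probs: Sequence[Sequence[float]],
    frame_rate: float,
    threshold: float = 0.5,
    onset_shift: Optional[Sequence[Sequence[float]]] = None,
    lowest_pitch: int = 21,
) -> List[Note]:
    """Turn a (frames x keys) activation matrix into note events.

    A note starts where a key's probability reaches `threshold` and ends
    where it falls back below it. `onset_shift` (hypothetical) is read as a
    regressed fractional-frame correction added to each onset frame.
    """
    notes: List[Note] = []
    if not frame_probs:
        return notes
    n_frames = len(frame_probs)
    n_keys = len(frame_probs[0])
    for key in range(n_keys):
        start = None
        # Loop one step past the end so a note still sounding at the last frame is closed.
        for t in range(n_frames + 1):
            active = t < n_frames and frame_probs[t][key] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                correction = onset_shift[start][key] if onset_shift is not None else 0.0
                onset_frame = start + correction
                notes.append(Note(lowest_pitch + key, onset_frame / frame_rate, t / frame_rate))
                start = None
    # Order notes by time, then pitch, like a score read left to right.
    notes.sort(key=lambda n: (n.onset, n.pitch))
    return notes
```

For example, with `frame_rate=100.0` (10 ms frames), a key whose probability reaches 0.5 in frames 1 and 2 becomes a note from 0.01 s to 0.03 s. Plain thresholding merges two repeated strikes of the same key if its activation never drops below the threshold in between, which is one reason onsets are often predicted separately.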

One of the main problems with this kind of approach is that it tends to overfit to the timbre and other characteristics of the instruments in the training data, which can lead to poor generalization to other recordings. Some recent work addresses this problem with data augmentation, for example applying small random pitch shifts to the training data to reduce overfitting. Other strategies for improving generalization include a dual objective that predicts both note onsets and frame-level activity, and training on larger, more diverse sets of annotated real recordings.
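
The first two strategies can be sketched together. This sketch assumes the input is a log-frequency spectrogram (for example, a constant-Q transform) with a fixed number of bins per semitone, so a pitch shift of k semitones can be approximated by moving the spectrogram by k times that many bins and the 88-key labels by k keys; real systems may instead pitch-shift the audio itself. The loss is a plain sum of binary cross-entropy terms for onset and frame targets. All function names and the `onset_weight` parameter are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def shift_columns(rows: Sequence[Sequence[float]], k: int) -> Matrix:
    """Move each row's values k positions up (k < 0 moves down); vacated slots become 0."""
    out: Matrix = []
    for row in rows:
        shifted = [0.0] * len(row)
        for i, value in enumerate(row):
            j = i + k
            if 0 <= j < len(row):  # values pushed off either end are dropped
                shifted[j] = float(value)
        out.append(shifted)
    return out


def random_pitch_shift(
    spec: Sequence[Sequence[float]],          # frames x log-frequency bins
    frame_labels: Sequence[Sequence[float]],  # frames x 88 keys, 1 = key sounding
    onset_labels: Sequence[Sequence[float]],  # frames x 88 keys, 1 = key struck
    bins_per_semitone: int,
    max_semitones: int = 2,
    rng: Optional[random.Random] = None,
) -> Tuple[Matrix, Matrix, Matrix, int]:
    """Apply the same random semitone shift to the input and both label matrices."""
    rng = rng or random.Random()
    k = rng.randint(-max_semitones, max_semitones)  # inclusive on both ends
    return (
        shift_columns(spec, k * bins_per_semitone),
        shift_columns(frame_labels, k),
        shift_columns(onset_labels, k),
        k,
    )


def bce(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over all frame/key cells."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def dual_objective_loss(onset_pred, onset_target, frame_pred, frame_target,
                        onset_weight: float = 1.0) -> float:
    """Onset term plus frame term; onset_weight is a hypothetical tuning knob."""
    return onset_weight * bce(onset_pred, onset_target) + bce(frame_pred, frame_target)
```

Shifting by whole semitones keeps the labels aligned with the 88-key grid. Notes shifted past A0 or C8 are simply dropped, which is a simplification.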

While current state-of-the-art models can achieve very good results, they are not yet robust enough for practical applications such as producing sheet music that a pianist could sight-read. A more robust approach is needed, and the research described in this paper takes promising steps in that direction.

It is worth noting that many pianists, even advanced ones, struggle with sight reading. Fortunately, there are ways to improve: practice in small chunks, perhaps only a bar or two at a time; look over the music before you start, then play it about 20% slower than you think you need to; and learn to find the keys by feel, so your eyes can stay on the score rather than on your hands. It is difficult to play a passage successfully if your eyes are constantly flitting between the score and the keyboard. The best sight readers, like the best musicians, see music in patterns, clusters, and groupings. Schenkerian theorists can distill an entire symphonic movement down to a fundamental line of as few as three notes, and pianists can apply the same principle on a smaller scale.